Bank Customer Churn Prediction

TL;DR

Random Forest classifier on 10,000 customers with an 80:20 class imbalance (non-churn:churn). Compared SMOTE with class-weight adjustment: SMOTE raised churn precision by 12 pp over the unhandled baseline (0.61 to 0.73), while class weighting gave the highest recall (0.84, 18 pp above SMOTE). The best overall model, SMOTE with a tuned decision threshold, achieved F1 = 0.76 on the churn class. Top three predictors by feature importance: age, balance, and credit score.

10,000 Customers | 80:20 Class Imbalance | F1 = 0.76 (Churn Class) | Random Forest

Python, Machine Learning, Random Forest, SMOTE, Classification, Class Imbalance

Project Overview

Customer churn in banking is a direct revenue loss: acquiring a new customer costs 5–7× more than retaining an existing one. This project builds a churn prediction system on 10,000 bank customers to identify high-risk customers before they leave — enabling the bank to deploy targeted retention interventions.

The dataset has an 80:20 class imbalance (8,000 non-churners : 2,000 churners). Without addressing this, a naive classifier achieves 80% accuracy simply by predicting "no churn" for everyone — useless for the actual business problem. The core engineering challenge was choosing the right imbalance-handling strategy depending on business priority: catching more churners (recall) vs avoiding false alarms (precision).
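The accuracy trap described above can be reproduced in a few lines. This is a minimal sketch in plain Python that uses only the 8,000 : 2,000 split stated in the text; the variable names are illustrative.

```python
# Labels matching the 80:20 split described above: 0 = stayed, 1 = churned.
y_true = [0] * 8000 + [1] * 2000

# A "model" that predicts "no churn" for every customer.
y_pred = [0] * len(y_true)

# Accuracy: share of customers labelled correctly.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Recall on the churn class: churners caught / all churners.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
churn_recall = true_positives / sum(y_true)

print(f"accuracy = {accuracy:.2f}, churn recall = {churn_recall:.2f}")
# accuracy = 0.80, churn recall = 0.00
```

Accuracy looks respectable while not a single churner is caught, which is why the comparison below reports precision, recall and F1 on the churn class instead.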

Model Comparison: SMOTE vs Class-Weight Adjustment

Approach	Precision (Churn)	Recall (Churn)	F1 (Churn)	Best For
No imbalance handling	0.61	0.48	0.54	—
SMOTE oversampling	0.73	0.66	0.69	Minimising false positives (targeted retention spend)
Class weight adjustment	0.61	0.84	0.71	Maximising recall (catching the most churners)
SMOTE + Tuned threshold	0.68	0.78	0.76	Best overall balance — recommended

The best-performing configuration combined SMOTE oversampling with threshold tuning (decision threshold lowered from 0.5 to 0.38), achieving F1 = 0.76 on the churn class, a 41% relative improvement over the no-handling baseline (F1 = 0.54).
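Threshold tuning of this kind can be sketched as a simple sweep: score each candidate cut-off by churn-class F1 and keep the best one. The helper names and the toy scores below are hypothetical and are not the project's data; the sketch also assumes the sweep is run on a held-out validation split rather than on the test set.

```python
def f1_at_threshold(y_true, churn_proba, threshold):
    """F1 on the churn class when every proba >= threshold is flagged."""
    y_pred = [int(p >= threshold) for p in churn_proba]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, churn_proba, candidates):
    """Return the candidate threshold with the highest churn-class F1."""
    return max(candidates, key=lambda t: f1_at_threshold(y_true, churn_proba, t))


# Toy validation labels and predicted churn probabilities (made up).
y_val = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
proba_val = [0.05, 0.10, 0.20, 0.30, 0.40, 0.45, 0.55, 0.60, 0.35, 0.90]

candidates = [round(0.05 * k, 2) for k in range(2, 19)]  # 0.10 to 0.90
chosen = best_threshold(y_val, proba_val, candidates)
f1_default = f1_at_threshold(y_val, proba_val, 0.5)
f1_chosen = f1_at_threshold(y_val, proba_val, chosen)
print(chosen, round(f1_default, 2), round(f1_chosen, 2))
```

On this toy data the sweep settles below 0.5 because one churner scores just under the default cut-off. The fitted model is untouched; only the rule that turns a probability into a label changes.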

Key Insights

Class imbalance is a business problem, not just a technical one — the right approach depends on whether the bank prioritises precision (targeted spend) or recall (catching all churners). There is no universally "best" answer.
Age (35–50), account balance, and credit score below 600 are the top three churn predictors — actionable signals for proactive outreach.
Germany-based customers churn at 2× the rate of French and Spanish customers — suggesting regional service or product gaps worth investigating.
Customers with 1 product churn at significantly higher rates than those with 2+ products — cross-selling is a directly addressable retention lever.
A customer flagged as high churn risk with an estimated lifetime value above £5,000 justifies a personalised retention call; below that threshold, an automated email campaign is more cost-effective.
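The lifetime-value rule in the last insight can be written as a small routing function. This is a sketch under stated assumptions: the £5,000 cut-off comes from the text, while the function name, the reuse of the 0.38 decision threshold to define "flagged", and the "none" branch for unflagged customers are illustrative choices.

```python
LTV_CALL_THRESHOLD_GBP = 5_000  # from the rule of thumb above
RISK_THRESHOLD = 0.38           # assumed: reuse the tuned decision threshold


def retention_action(churn_proba: float, lifetime_value_gbp: float) -> str:
    """Pick a retention channel for one customer."""
    if churn_proba < RISK_THRESHOLD:
        return "none"  # assumption: no outreach for customers not flagged
    if lifetime_value_gbp > LTV_CALL_THRESHOLD_GBP:
        return "personal_call"
    return "automated_email"


print(retention_action(0.62, 12_000))  # personal_call
print(retention_action(0.62, 1_500))   # automated_email
print(retention_action(0.10, 12_000))  # none
```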

Technical Implementation

Preprocessing & Feature Engineering:

Handled missing values and outliers (balance had right-skewed distribution — log transform applied).
Encoded categorical variables: Geography (one-hot), Gender (binary).
StandardScaler applied to numerical features so that SMOTE's nearest-neighbour distance calculations are not dominated by features on larger scales.
Created interaction feature: balance_to_salary_ratio — stronger predictor than either alone.
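The encoding and feature-engineering steps above might look roughly like this in pandas. The column names (Balance, EstimatedSalary, Geography, Gender) are assumptions about the dataset's schema, as are the use of log1p to keep zero balances finite and the choice to compute the ratio on the raw balance. Scaling is left out here; it would be fitted on the training split only.

```python
import numpy as np
import pandas as pd


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the log transform, encodings and interaction feature."""
    out = df.copy()
    # Interaction feature, computed on the raw balance (assumption).
    out["balance_to_salary_ratio"] = out["Balance"] / out["EstimatedSalary"]
    # log1p instead of log so that zero balances stay finite (assumption).
    out["Balance"] = np.log1p(out["Balance"])
    # Binary gender flag; the 1 = Female convention is arbitrary.
    out["Gender"] = (out["Gender"] == "Female").astype(int)
    # One-hot encode geography into Geography_<country> columns.
    return pd.get_dummies(out, columns=["Geography"], dtype=int)


# Three made-up customers, purely to show the output shape.
sample = pd.DataFrame({
    "Balance": [0.0, 125000.0, 80000.0],
    "EstimatedSalary": [50000.0, 100000.0, 40000.0],
    "Geography": ["France", "Germany", "Spain"],
    "Gender": ["Female", "Male", "Female"],
})
features = engineer_features(sample)
print(features.round(2))
```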

Modelling Approaches:

Random Forest + SMOTE: SMOTE applied only to training set (not test — critical to avoid data leakage).
Random Forest + Class Weight: class_weight='balanced' in sklearn, which internally applies inverse-frequency weighting.
5-fold stratified cross-validation on all configurations to ensure consistent evaluation across imbalanced folds.
Hyperparameter tuning via RandomizedSearchCV: n_estimators, max_depth, min_samples_split.
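The class-weight configuration, stratified folds and randomized search listed above can be combined as in the sketch below. It runs on synthetic data from make_classification rather than the bank dataset, and the search grid, n_iter and scoring choice are illustrative assumptions, not the project's actual settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.utils.class_weight import compute_class_weight

# Synthetic stand-in for the customer data, with a roughly 80:20 split.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=0
)

# What class_weight="balanced" computes:
# n_samples / (n_classes * n_samples_in_class), so the rare class weighs more.
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)
print(dict(zip([0, 1], weights.round(2))))

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(class_weight="balanced", random_state=0),
    param_distributions={
        "n_estimators": [50, 100],
        "max_depth": [4, 8, None],
        "min_samples_split": [2, 5, 10],
    },
    n_iter=4,      # number of sampled configurations (kept small here)
    scoring="f1",  # F1 on the positive (churn) class
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Stratified folds keep the churn share roughly constant in every split, so fold-to-fold F1 differences reflect the model rather than an unlucky draw of churners.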

Key Learnings

SMOTE must only be applied to training data. Applying it before the train/test split inflates evaluation metrics: synthetic samples are interpolated between minority-class neighbours, so some are built from rows that later land in the test set, and the model is effectively scored on points it has partly seen. This is a common mistake and a red flag in DS interviews.
Threshold tuning is underutilised: most projects stop at the default 0.5 threshold. Moving the threshold to 0.38 improved churn F1 by 7 points (0.69 to 0.76) without changing the model at all, purely by redefining what counts as a "positive" prediction.
Feature importance from Random Forest is a starting point, not a conclusion: correlated features can dilute each other's importance scores. SHAP values would give consistent, per-prediction attributions and can be applied across model types.

Future Work

Add SHAP values for model interpretability — Random Forest feature importance doesn't account for feature correlation, which skews attribution for correlated predictors.
Build a simple customer LTV model to weight churn predictions by business impact — a high-probability churn on a low-value customer is less urgent than a moderate-probability churn on a high-value one.
Evaluate gradient boosting (XGBoost/LightGBM): these models often outperform Random Forest on structured tabular data, including imbalanced problems, so they are worth benchmarking on the same folds.

Built by Om Patel — ML Engineer & Data Scientist.
Explore more projects on my Portfolio.